Abstract

This paper describes an attempt to combine
the advantages of example-based translation
and stochastic translation methods in order to
develop a method for inferring symbolic
transfer functions from a bilingual corpus. By
formalizing the translation process and applying 
standard optimization techniques, a system can 
be developed that will identify grammatical cat- 
egories and produce coherent transfer functions 
between languages. The validity of this approach 
is demonstrated in a prototype system that can 
learn transfer functions between English, French, 
and Urdu. 

Keywords: Hybrid approaches, corpus-based 
NLP, machine translation 

Introduction 

The development of symbolic machine translation 
systems is difficult and expensive, but the devel- 
opment of non-symbolic MT systems is typically 
not much easier or less expensive. An ideal MT 
system would be able to identify the structure
of the source and target languages without the
assistance of human engineers, but at the same time
be easily understood, corrected, and updated to 
reflect new information and situations. For the 
second, a symbolic approach is almost a neces- 
sity, as subsymbolic or completely automatic in- 
ference systems tend to produce nearly opaque 
sets of numbers as their "outputs." However, the 
process of describing human languages in sym- 
bolic terms is difficult and knowledge-intensive. I 
describe here a prototype system for deriving lin- 
guistically plausible and understandable transfer 
functions from paired corpora without the neces- 
sity of human intervention or preanalysis. 



Background 

The major engineering bottleneck to machine 
translation, in general, is the development of the 
knowledge base, such as the linguistic analysis 
tools and the bilingual dictionary. The cost of
developing dictionaries or linguistically sophisticated
parsers is comparable to the cost of developing
an expert system. Many researchers have
looked for tools that could be used to automati- 
cally develop an analyzer for the source language 
(or a generator for the target language) from sam- 
ples, or for that matter, a method of developing 
a bilingual dictionary from samples. 

Nagao (Nagao, 1984) proposed just such an
analytical method, which would
become known as "example-based translation."
Instead of encoding explicit transfer rules, trans- 
lation systems would store a collection of trans- 
lation examples that would provide coverage in 
context for the input. A novel sentence would 
then be compared against the sentences most sim- 
ilar to it in the database, and the similar compo- 
nents would be combined into a translation. Al- 
though the notions of "similar" and "component" 
still require a fair amount of human engineering, 
systems have been built (Sumita et al., 1990; Sato,
1991) which show good results without an explicit 
dictionary. 

Another approach (Brown et al., 1990) that
avoids the knowledge acquisition bottleneck uses 
a huge bilingual corpus to attempt to solve 
the translation problem as a mapping between 
Markov chains. The researchers treat every sen- 
tence as generated by a Markov process and then 
consider every sentence to be a possible trans- 
lation of every other sentence. The "transla- 
tion" is simply the most probable translation of 
a given source sentence. Although this approach
avoids the knowledge acquisition problem and can
achieve good results, the grammar and analysis
techniques are psychologically implausible. 

Self-Organizing Automatic 
Translation 

The following work rests largely on the work of 
Green and others (Green, 1979; Morgan et al.,
1989; Mori and Moeser, 1983) and their results 
on the psycholinguistics of language learning and 
the inherent structure of natural language. In 
particular, their research indicates that grammat- 
ical constructions are typically marked at surface- 
level by the appearance of a closed-class word or 
morpheme. A transparent example of this can be 
seen in Japanese. The basic sentence structure 
of Japanese is subject-object-verb. However, the 
noun phrase that fills the subject role is specifi- 
cally marked by the appearance of a particle such 
as 'wa' or 'ga' immediately following. Similarly, 
the object noun phrase is marked by the particle 
'o.' Similarly, in English, a noun phrase ("a noun 
phrase") is typically structured as a determiner 
or quantifier, a possible set of modifiers, and then 
the head noun itself. The determiner strongly sig- 
nals the beginning of a complete noun phrase. If 
a prepositional phrase follows, it is marked by 
the appearance of a preposition, and then an- 
other noun phrase with its attendant structure. 
Such markers have not only been attested cross- 
linguistically, but have been shown to greatly help 
language learning in psychological experiments. 
Together, these observations constitute the
Marker Hypothesis.

This hypothesis may provide a way to cast 
translation problems into a form solvable by stan- 
dard multivariate optimization techniques. If lan- 
guages are assumed to be largely compositional 
and grammatically structured, most of the pre- 
liminary work in translation is to identify the 
grammatical structures and corresponding struc- 
tures in the target language as well as to find the 
appropriate expression of those structures. These 
marker constructions can provide powerful cues 
to these structures. Using a rewrite-rule formalism
defined below, one can define the translation
problem as the problem of simultaneous
identification of:

• the grammatical organization of the source lan- 
guage 

• the changes necessary to convert the grammar 
of the source language into that of the target 
language (which in a CFL can be done by per- 
mutation operations) 

• the "dictionary": a table lookup expressing
the most appropriate translation for each gram- 
matical element in the source language. 



The input to this system is simply a set of
paired sentences in the source and target languages.
The only preprocessing done is the compilation
of a list of the tokens appearing in
the input data, to simplify the construction of
a source-target dictionary. Unlike the system of
(Sumita et al., 1990), there is no extensive semantic
processing, so the development of a new data set for
a new language pair requires only data entry from 
an existing body of translated text. The output 
of this learning, of course, is a description of the 
translation algorithm as defined above. 

The Formalism 

(Juola, 1994) demonstrates the existence of a 
normal form for context-free grammars (CFGs) 
which incorporates some of the properties of the 
Marker Hypothesis as described above. Specifi- 
cally, grammars in marker-normal form have all 
rules in one of the following forms:

A → ε

A → a

A → bBcCdD ...
and therefore all non-terminal constituents are 
marked by a terminal symbol. This template thus 
gives a skeletal grammar-scheme that can be used 
to describe large classes of grammars in terms of a 
relatively small number of parameters (how many 
rules are there, what terminal symbol appears at 
position i of rule j, what non-terminal symbol
appears at position i + 1 of rule j, and so forth). 

Given a filled set of these parameters, a closely
related grammar can be used to partition sentences
into their constituents. In the third example
above, the initial A-phrase would be partitioned
into the "phrases" (bB) (cC) (dD) ... and
each phrase would be recursively parsed using
additional rules or sets of rules.
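
The partitioning step can be illustrated with a minimal sketch. It assumes the marker words for a rule are already known; the function name and the sample marker set are hypothetical and are not taken from the system described here.

```python
from typing import List, Set


def partition_by_markers(tokens: List[str], markers: Set[str]) -> List[List[str]]:
    """Split a token list into phrases, opening a new phrase at every marker word.

    Each phrase corresponds to one (bB) pair of a marker-normal-form rule:
    a marker terminal followed by the (possibly empty) constituent it marks.
    """
    phrases: List[List[str]] = []
    for tok in tokens:
        # A marker word opens a new phrase; so does the very first token.
        if tok in markers or not phrases:
            phrases.append([tok])
        else:
            phrases[-1].append(tok)
    return phrases


# Hypothetical marker set for illustration only.
print(partition_by_markers("the man in the shop".split(), {"the", "in"}))
# [['the', 'man'], ['in'], ['the', 'shop']]
```

A full parser would apply the same step recursively to each phrase with the marker set of the appropriate rule; this sketch shows a single level only.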

Any phrase (typically a single-word phrase)
which cannot be further broken down by this procedure
will be treated as a unit for purposes of
translation. Such lexicalized units can simply be 
looked up in a dictionary and replaced by a corre- 
sponding target word, most often a direct trans- 
lation. These translations are used as the basic 
blocks for constructing the target sentence. The 
entire process can be learned and run without hu- 
man intervention, but is understandable enough 
to permit human analysis and correction if nec- 
essary. 

The Learning Algorithm 

If there is such a thing as a "most general" prob- 
lem in computer science, one candidate is cer- 
tainly non-linear multivariate optimization. Al- 
most any task of interest can be described as an 
equation or a set of equations over some complex
mathematical space. The reduction of boolean 
logic to algebra is well-known, and symbolic (dis- 
crete) tasks can be easily converted to continuous 
search spaces by incorporation of an additional 
term that adds large amounts of error to a sys- 
tem when the parameters are "distant from" the 
desired discrete numbers. Because of this simple 
reduction, any semi-mathematical learning situa- 
tion can be easily cast as a task of simultaneous 
parameter estimation. 
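
One common way to write such an additional term is a quadratic penalty. This form is an assumption made for illustration; the exact term used is not specified here, and the symbols E, θ, D, and λ are introduced only for this sketch.

```latex
E'(\theta) \;=\; E(\theta) \;+\; \lambda \sum_{i} \min_{d \in D} \left(\theta_i - d\right)^2
```

Here E is the original error, D is the set of admissible discrete values, and λ > 0 is a large weight. The penalty vanishes exactly when every parameter θ_i lies in D and grows as a parameter moves away from its nearest discrete value.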

Within this learning framework, it is easy to 
turn the individual components of the translation 
system described into semi-numerical optimiza- 
tion tasks. For example, most of a dictionary is a 
simple word-to-word mapping. With some gross 
simplifying assumptions, one can simply guess 
the entire dictionary mapping and then tune the 
guess, one entry at a time, to reflect the actual 
contents of the dictionary. Words with multiple 
definitions (for example, 'you' can be translated 
as 'tu' or 'toi' depending upon its grammatical 
role) can be handled by simply learning a large 
number of mappings to be used in the "appro- 
priate" contexts. One-to-many and many-to-one 
translations (for example, 'not' to 'ne/pas') can 
be handled by accepting the null string as a po- 
tential word, and thus insertions and deletions 
can be done in certain specific contexts. 
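
A context-sensitive dictionary with the null string can be sketched as a table keyed by context and source word. The context labels, the pass-through behavior for unknown words, and all names below are assumptions made for illustration; only the word pairs come from the examples in this paper.

```python
from typing import Dict, List, Tuple

# Maps (context, source word) to a target word; "" is the null string.
Dictionary = Dict[Tuple[str, str], str]

# Hypothetical context labels; the contexts are not named in the text.
FRENCH: Dictionary = {
    ("subject", "you"): "tu",
    ("prepositional_object", "you"): "toi",
}
URDU: Dictionary = {
    ("noun_phrase", "the"): "",      # deletion: the article maps to the null string
    ("noun_phrase", "man"): "admi",
}


def translate_words(words: List[str], context: str, dictionary: Dictionary) -> List[str]:
    """Translate each word in the given context, dropping null-string results."""
    out: List[str] = []
    for word in words:
        # Assumption: unknown words pass through unchanged.
        target = dictionary.get((context, word), word)
        if target:
            out.append(target)
    return out


print(translate_words(["the", "man"], "noun_phrase", URDU))   # ['admi']
print(translate_words(["you"], "subject", FRENCH))            # ['tu']
```

Tuning such a table one entry at a time amounts to changing a single value and re-measuring the translation error.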

As has been stated above, the task of restruc- 
turing the source parse tree to reflect target struc- 
ture can be performed by a simple permutation 
of the daughters at a given node. The task of 
learning a specific minimal-cost permutation has 
been well-studied in optimization theory as the 
"Traveling Salesman Problem." This can also be 
solved by a guess-and-tune algorithm where two 
cities (or constituents) are swapped at each iter- 
ation and compared against the desired results. 
Although in theory each rule of the source gram- 
mar will have a permutation to learn, there are 
few constituents per rule and the problem should 
still be tractable. 
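
The swap-based tuning of a permutation can be sketched as follows. For reproducibility this sketch sweeps over all pairs of positions in a fixed order rather than choosing swaps at random, and it scores a candidate by counting mismatched positions; both choices are assumptions. The rule is flattened to three daughters for illustration.

```python
from itertools import combinations
from typing import Callable, List, Sequence


def mismatches(candidate: Sequence[str], target: Sequence[str]) -> int:
    """Number of positions at which the candidate differs from the target."""
    return sum(a != b for a, b in zip(candidate, target))


def tune_permutation(n: int, cost: Callable[[List[int]], int]) -> List[int]:
    """Start from the identity guess and keep any pairwise swap that lowers the cost."""
    perm = list(range(n))
    best = cost(perm)
    improved = True
    while improved:
        improved = False
        for i, j in combinations(range(n), 2):
            perm[i], perm[j] = perm[j], perm[i]      # try the swap
            c = cost(perm)
            if c < best:
                best, improved = c, True             # keep it
            else:
                perm[i], perm[j] = perm[j], perm[i]  # undo it
    return perm


# Translated daughters in English order, and the desired Urdu order.
source = ["lao", "chitthi", "dukan se"]
target = ["chitthi", "dukan se", "lao"]
perm = tune_permutation(3, lambda p: mismatches([source[k] for k in p], target))
print(perm, [source[k] for k in perm])
# [1, 2, 0] ['chitthi', 'dukan se', 'lao']
```

With distinct constituents and this cost, some swap always reduces the mismatch count until the target order is reached, so the sweep terminates at the desired permutation.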

The hardest subproblem to cast into this optimization
framework is the acquisition of the
grammar. As has been discussed above, the existence
of marker words makes the task much easier.
If every constituent is marked by the appearance
of a closed-class word, then by identifying
the relevant marker words, one can segment
the sentence into its constituents and parse recursively.
Some work (Smith and Witten, 1993) has
been done on sophisticated methods for identifying
such words, but for this prototype system, the
input vocabulary was small enough that the system
could simply learn the marker word set directly,
treating every word as a potential marker word.



The acquisition proceeds as follows: for each rule in the
grammar, one must identify the subset of marker
words which are relevant to that rule (for example,
determiners are relevant to noun phrases) and
the classes of the constituents themselves (for example,
S → N V). Guess-and-tune will obviously
serve for subset selection (each marker word is 
either in or out of the relevant set and can be 
changed individually), and can be made to serve 
for class-identification by enumerating all poten- 
tial non-terminal symbols and allowing the sys- 
tem to select among them for every symbol on 
the right hand side of a skeletal grammar. Given 
this grammar, the system can use it to parse and 
translate selected sentences from the training set. 

Once the numerical framework is in place, one 
has a free hand in selecting the learning algo- 
rithm itself; the work described below uses a sim- 
ple variant of simulated annealing. Originally a 
model of crystal growth, simulated annealing has 
the advantage of being well-known, well-studied, 
and relatively uncontroversial as an optimization 
technique. This technique is a modified version of 
a simple random walk through the dataspace of 
interest. The system starts out at some location 
in the dataspace (the guess) and makes modifi- 
cations, at random, to the parameter set. At 
each stage, the new parameter set is measured 
against the old parameter set, and if it results 
in improved performance, the old parameter set 
is discarded and replaced (the tuning). Even if 
the new set results in reduced performance, the 
old set may be discarded and replaced if the re- 
duction isn't too bad. As the tuning progresses, 
the notion of "too bad" is gradually tightened
until the system accepts only improving changes
and, given a sufficiently slow schedule, will tend
to find the global optimum. Despite
the simplicity of this algorithm ("It guesses
some random grammar and then looks to see if it
works well enough. If not, it randomly changes
something to see if that makes it better." 1), simulated
annealing is actually a relatively powerful
and efficient technique for nonlinear optimizations.
Other experiments are in progress using
other optimization techniques such as genetic al- 
gorithms, but the results have been largely similar 
to the work presented here and so are omitted for 
brevity. For interim evaluation as required for er- 
ror measurement, the system uses an edit-graph 
(diff, to UNIX programmers) formalism (Myers,
1986) as an approximate measure of the amount 
of work that would be necessary to convert the 
results of the translation into the actual target 
sentences from the example database. 
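
The annealing loop and its error measure can be sketched together. This is a minimal illustration, not the system's implementation: a word-level Levenshtein distance stands in for the edit-graph formalism of (Myers, 1986), and the geometric cooling schedule, its constants, and all function names are assumptions.

```python
import math
import random
from typing import Callable, Sequence, Tuple, TypeVar

S = TypeVar("S")


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Word-level Levenshtein distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def anneal(state: S, cost: Callable[[S], float],
           neighbor: Callable[[S, random.Random], S], rng: random.Random,
           t0: float = 2.0, cooling: float = 0.95, steps: int = 500) -> Tuple[S, float]:
    """Random walk that always accepts improvements and sometimes accepts
    worse states; the tolerance t shrinks at every step."""
    cur, cur_cost = state, cost(state)
    best, best_cost = cur, cur_cost
    t = t0
    for _ in range(steps):
        cand = neighbor(cur, rng)
        cand_cost = cost(cand)
        delta = cand_cost - cur_cost
        # Accept if not worse, or with probability exp(-delta / t) if worse.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            cur, cur_cost = cand, cand_cost
            if cur_cost < best_cost:
                best, best_cost = cur, cur_cost
        t *= cooling
    return best, best_cost


def swap_two(order: Tuple[int, ...], rng: random.Random) -> Tuple[int, ...]:
    """Neighbor move: exchange two randomly chosen positions."""
    i, j = rng.sample(range(len(order)), 2)
    new = list(order)
    new[i], new[j] = new[j], new[i]
    return tuple(new)


# Toy problem: recover the word order of a target sentence.
words = ["lave", "chat", "ce"]
target = ["ce", "chat", "lave"]
cost = lambda order: edit_distance([words[k] for k in order], target)
best, best_cost = anneal((0, 1, 2), cost, swap_two, random.Random(0))
print(best, best_cost)
```

The sketch returns the best state seen during the walk, so the reported cost is never worse than that of the initial guess. In the real task the state would hold all grammar, permutation, and dictionary parameters at once.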



1 James Martin, University of Colorado, personal 
communication of July 28, 1994 

The Translation Algorithm

Once a skeletal transfer function has been gener- 
ated, and particularly at the conclusion of the 
learning phase, the system should have a list 
of numerical parameters that describe the ac- 
tual transfer process. A sentence to be trans- 
lated is first parsed by the marker-normal form 
grammar generated above. All leaf nodes are 
replaced by their translations as defined by 
the context-sensitive dictionary representing their 
parent class. The system then proceeds to restructure
the parse tree in strict bottom-up fashion.
The children of a particular parent are reordered
in keeping with the permutation function,
and then the children's words are concatenated
into a constituent phrase which replaces the
(sub)tree of the parent.

The parent, then, is now a leaf and a child in its
own right for another node, and its phrase will be
incorporated into a larger phrase until the entire 
tree has been reordered and flattened into a trans- 
lation of the entire source sentence. This sentence 
can either be presented to the user or compared 
against the target sentences in the example base 
for evaluation. 
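
The translation procedure can be sketched as a short recursive function. The tree encoding, the rule names "S" and "VP", and the phrase-level dictionary entries are assumptions chosen for illustration; the sentence and its translation are the first example of Table 1.

```python
from typing import Dict, List, Tuple, Union

# A tree is either a leaf phrase or a pair (rule name, list of daughters).
Tree = Union[str, Tuple[str, List["Tree"]]]


def translate(tree: Tree, parent: str,
              dictionary: Dict[Tuple[str, str], str],
              permutations: Dict[str, List[int]]) -> str:
    """Translate leaves by (parent class, phrase) lookup, then reorder and
    flatten each node bottom-up."""
    if isinstance(tree, str):
        # Assumption: phrases missing from the dictionary pass through unchanged.
        return dictionary.get((parent, tree), tree)
    rule, daughters = tree
    phrases = [translate(d, rule, dictionary, permutations) for d in daughters]
    order = permutations.get(rule, list(range(len(phrases))))
    # Null-string translations are dropped when the phrases are concatenated.
    return " ".join(phrases[k] for k in order if phrases[k])


tree: Tree = ("S", ["the man", ("VP", ["is", "in the shop"])])
dictionary = {
    ("S", "the man"): "admi",
    ("VP", "is"): "hai",
    ("VP", "in the shop"): "dukan men",
}
permutations = {"S": [0, 1], "VP": [1, 0]}   # identity for S, swap for VP

print(translate(tree, "ROOT", dictionary, permutations))
# admi dukan men hai
```

In the full system the leaf phrases would themselves be parsed further; they are treated as dictionary units here to keep the example short.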

Experiments 

The above formalism has been tested in a se- 
ries of experiments designed to determine the 
strengths and limitations of the approach as well 
as the most efficient algorithms to use for error 
metrics and optimization techniques. I report 
here on the two most interesting from a linguis- 
tic/theoretical point of view. In the first experi- 
ment, we created artificial context-free grammars 
covering interesting and relatively complex sub- 
sets of French and English (including sentential 
complements, relative clauses, and gender distinc- 
tions), and attempted to learn the appropriate 
transfer functions from English to French based 
on a twenty-nine sentence artificial corpus developed
and tested against the judgements of a native
speaker of French. I then tested the system
both against the training corpus and against
novel sentences that had not been seen in the 
training phase. 

The other experiment was designed as an at- 
tempt to model a "situated cognition" task, and 
to compare the performance of the system against 
a comparable task for humans, that of learn- 
ing a second language (Urdu) by exposure to in- 
structional documents. In particular, the system 
was given the vocabulary and grammar examples 
from a particular sample of instructional text (les- 
son two from (ur Rahman, 1958)) and told to 
learn appropriate transfer functions. As above, 
the resulting system was tested against both the
training corpus and novel sentences (the exercises 
from the same lesson). 

Evaluation 

The problem of evaluating translations, whether 
human or machine, is difficult indeed. The fol- 
lowing sections provide examples of two sorts of 
evaluations done on the system. The first, a black 
box examination, is a simple comparison of the 
system's results with the correct translation on 
novel material. For most stochastic MT systems, 
this is the only type of analysis that can be done, 
because the tables of probabilities are, to all prac- 
tical purposes, opaque. This level of analysis typ- 
ically will show the presence of errors, but not 
necessarily the source, the method of correction, 
or even what sorts of training data should be in- 
cluded to reduce them. However, because the sys- 
tem uses the symbolic information incorporated 
into the marker hypothesis (as well as a more lin- 
guistically plausible description in the form of a 
CFG), one can also do a glass box examination 
of the details of the translation functions and use 
this additional information to identify the precise 
source (in many cases a single decision) of errors 
and correct them. 

Black Box Evaluation 

In the English→Urdu experiment, the training
data consisted of a vocabulary list for both lan- 
guages and a set of seven sentences providing 
coverage for copula-locatives and imperative sen- 
tences. The training set was learned without er- 
rors, and the system was then tested on the exer- 
cises incorporated into lesson two from (ur Rah- 
man, 1958). Except in cases where an individual 
lexical item had not been included in the training 
data, the test set was also translated perfectly. 
Expansion of the training data to include the 
complete system vocabulary in context resulted 
in a perfect transfer function for the constructions 
studied. 

As is to be expected, the higher syntactic 
complexity of the English→French data reduced
system performance considerably. I performed 
this experiment six times, using different random 
seeds and annealing schedules. Over several tri- 
als, the system was usually able to reproduce be- 
tween 30% and 70% of the sentences correctly. 
Typical error patterns were instructive, however. 
The system usually identified the verb correctly, 
and the most common errors were context-sensitive
deletion errors, for instance of a pronominal
direct object, a reflexive particle or a complementizer.
Over the repeated experiments, only 5% of
the sentences were translated as gibberish. 

Trials on novel sentences seem to bear this
out. The experiment with the longest amount 
of learning time was selected for further analy- 
sis, and presented with novel sentences developed 
by native English speakers from the system's vo- 
cabulary without knowledge of the experimental 
results. These sentences were post-edited to limit 
the sentences to syntactic forms present in the 
training data (for example, the system only saw 
"glass" as a noun, not an adjective) and presented 
to the system without further modification. The 
system correctly translated 46% of the novel sen- 
tences, and produced the minor deletion errors 
or simple lexical production errors on another 
8%. (These numbers can be compared, respec- 
tively, with the 51% perfect and 44% minor errors 
that this particular trial achieved on the training 
data.) In particular, the system had the most dif- 
ficulty with the areas where the French structure 
is the most different from English: in the preverbal
(pronominal) direct and indirect objects, the
non-optional sentence complementizers, and reflexive
particles.

Glass Box Evaluation 

A major advantage of this approach is that it is 
possible to directly read a grammar and transfer 
functions from the output of the simulated an- 
nealing. This allows us to explain the behavior 
of the system at a conceptual level appropriate 
to the task at hand. For example, by examining 
the set(s) of words used as grammatical markers, 
one can identify which words the system felt were 
used in similar contexts and identify mistakes in 
class-identification. This allows us (or any user) 
to provide human guidance at any point in the 
learning cycle. 

As an example of this approach, I will conduct 
a partial evaluation of the English→French transfer
function discovered by the system. The fundamental
unit for analysis should be a sentence,
e.g.: 

the man washes the car 
The system attempts to translate this sentence by 
noting that words like 'washes', 'falls', 'creates', 
etc., comprise a class, and that everything before
the appearance of the first word of this type
should be translated as a unit. A similar process
identifies the words 'a', 'the', 'this', and 'that' as
another class, which divides the third utterance of
the sentence from the second.

(the man) (washes) (the car) 
These utterances are themselves translated 

(le homme) 2 (lave) (la voiture) 

2 The process that converts le homme to l'homme is



and then permuted (via the identity permutation) 
and concatenated to form the French translation. 
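
The division at the first appearance of a class word can be sketched as follows. The function name is hypothetical, and the verb class is extended with 'touches' as an assumption so that a sentence from Table 2 can be shown.

```python
from typing import List, Set, Tuple


def split_at_first(tokens: List[str],
                   word_class: Set[str]) -> Tuple[List[str], List[str], List[str]]:
    """Divide a sentence at the first token that belongs to the given class."""
    for i, tok in enumerate(tokens):
        if tok in word_class:
            return tokens[:i], [tok], tokens[i + 1:]
    return tokens, [], []   # no class word found: the sentence is one unit


VERBS = {"washes", "falls", "creates", "touches"}

print(split_at_first("the man washes the car".split(), VERBS))
# (['the', 'man'], ['washes'], ['the', 'car'])

print(split_at_first("the man that touches a car touches a glass".split(), VERBS))
# (['the', 'man', 'that'], ['touches'], ['a', 'car', 'touches', 'a', 'glass'])
```

The second call reproduces the division shown for the third sentence of Table 2, where the split falls on the first 'touches' rather than on the main verb.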

Upon this level of analysis, the reasons for cer- 
tain types of errors are clearly shown. For ex- 
ample, the identity permutation learned above 
is applicable in some instances but not in others 
(pronominal objects, for instance), merely reflecting
the most common rule in the training set. A
more sophisticated system would have the capa- 
bility of applying different permutations depend- 
ing upon the relevant utterances. It may be im- 
practical on a large scale to list the verbs in a 
language explicitly, and some form of on-line tagging
(Cutting et al., 1992) may be useful. The
most serious flaw is that the classes learned are 
imperfect (for example, 'woman' is mis-classified 
as a verb, resulting in a serious mistranslation of 
the sentence "*(the) (woman washes) (the car)"),
but these are easily identifiable and correctable by 
more sophisticated search algorithms or by hu- 
man intervention as necessary. The error here 
is more sophisticated than merely garbage in the 
partition. Because the system is allowed to reuse 
the partition classes to allow for recursive produc- 
tions, occasional local maxima are found where 
two classes are parsed and translated by the same 
function — in the case of this particular error, the 
set described above as "verbs" is also used to dis- 
tinguish between masculine and feminine nouns 
so that articles can be correctly translated. 

Some translations as performed by the METLA 
system are attached as tables 1 and 2. These ta- 
bles show, for each language pair, sample source 
sentences, their initial division into components, 
their translated components, and the final sen- 
tence as translated. As can be seen, the Urdu 
sentences are not only translated perfectly but 
also broken down into logical and linguistically 
cogent categories such as subjects, objects, and 
prepositional phrases. 

The results for the French are not as uniformly 
positive, but for each of the incorrect sentences, 
it is possible to identify and explain the source of 
the errors. For example, the division of the third 
sentence is incorrect — "the man that touches the 
car" is an entire component and the main verb of 
the sentence is the second token of 'touches.' This 
is an artifact of the admittedly broken METLA-1
parsing algorithm, which divides at the first ap- 
pearance of a given token. That this sentence is 
correctly translated at all is a tribute to the re- 
markable structural similarity between this sen- 
tence and its French translation. The fifth sen- 
tence is an example of a so-called "reflexive" verb; 



almost purely phonetic, and was ignored for purposes 
of this experiment. 

the man is in the shop
((the man) ((is) (in the shop)))
((admi) ((hai) (dukan men)))
admi dukan men hai

bring the letter from the shop
(bring) ((the letter) (from the shop))
(lao) ((chitthi) (dukan se))
chitthi dukan se lao

wait in the office
(wait) (in the office)
(thairo) (daftar men)
daftar men thairo

put the box on the table
(put) ((the box) (on the table))
(rakho) ((sanduq) (mez par))
sanduq mez par rakho

Table 1: Sample English→Urdu translations with
partial analysis

the proper translation should be "ce chat se lave," 
where 'se' is a general pronoun meaning 'self.' In
English, certain verbs can be intransitive when 
the subject and object of the verb are the same — 
for example, "I shave (myself) every morning," 
"I wash," and so forth. Some of these verbs, in 
turn, must be expressed with the reflexive particle 
in French but with an ordinary direct object oth- 
erwise. This leads, in turn, to another example of 
the multiple-necessary-permutation problem dis- 
cussed above. 

A similar analysis can be done for the early 
experiments in English→Urdu translation. In
this case, the errors can be tied directly to the 
fact that the word "knife", although learned as 
a single lexical item, was never presented in con- 
text, and the system had no way of identifying its 
grammatical class. When presented as part of the 
test data, the system determined (randomly) that 
it was a determiner, received no evidence to dis- 
prove this during training, and mistranslated ac- 
cordingly. When the training data was extended 
to cover this case, the percentage correct rose to 
100%. 

It is important to not let the errors made by 
a prototype system overshadow the larger re- 
sults. Despite having no a priori knowledge of se- 
mantic/syntactic categories, the system correctly 
identified the important grammatical constructs 
most of the time and built appropriate transfer 
functions. The errors were easily identifiable by 
examining the set of "marker words" used in pars- 
ing. Adding a new language to the system re- 
quired only typing in the training data; no sys- 
tem modifications were required. If this approach 
scales well to larger corpora and vocabularies, 

the glass touches a car
(the glass) (touches) (a car)
(le verre) (touche) (une voiture)
le verre touche une voiture

she washes a cat
(she) (washes) (a cat)
(elle) (lave) (un chat)
elle lave un chat

the man that touches a car touches a glass
(the man that) (touches) (a car touches a glass)
(le homme qui) (touche) (une voiture touche un verre)
le homme qui touche une voiture touche un verre

that man washes a car that she creates
(that man) (washes) (a car that she creates)
(ce homme) (lave) *(une voiture qui elle creee)
*ce homme lave une voiture qui elle creee

this cat washes
(this cat) (washes) ()
(ce chat) (lave) ()
*ce chat lave

Table 2: Sample English→French translations
with partial analysis

this should cut the time necessary to develop and
maintain large MT databases by a large fraction. 

Obviously, many more experiments are re- 
quired. Although the Urdu experiments demon- 
strated the ability to identify and permute high- 
order structures (such as basic Greenbergian word 
order), further testing is indicated. In addition, 
the small artificial corpora used should clearly be 
replaced by aligned real-world corpora as used by 
(Brown et al., 1990). (Wu, 1994) describes an 
experiment in aligning parallel Chinese-English 
texts that would provide data for a non-Indo- 
European language in large enough quantities to 
provide a significant test. Using this or a simi- 
lar corpus should provide enough information to 
allow METLA to be improved considerably in fu- 
ture versions. 

Conclusions 

It may, at this point, be useful to revisit some 
of the differences between this work and some of 
the major projects in EBMT and statistical approaches.
This work does not exclusively focus
on grammatical induction. Although grammati- 
cal induction is an important part of the task, nei- 
ther the problem (translation) nor the approach 
guarantees that the system will learn anything usable
for grammaticality judgements. At the same
time, the system would presumably be robust 
enough to translate malformed phone numbers 
without causing system errors. This is clearly 
an advantage in dealing with real-world input, 
where typographical errors and misphrasings are
not uncommon. At the same time, this system includes
grammatical structure which should result
in more robust, understandable, and linguistically 
plausible translation functions than the Markov 
chains developed by (Brown et al., 1990). 

Finally, although the system uses examples to 
develop its translation functions, there are several 
crucial differences between the proposed work and 
the more mainstream EBMT paradigm. First, 
other than the notion of paired sentences, there is 
no preanalysis of the translation database, which 
greatly reduces the load on the developers of the 
system. This system also produces a reduced 
database, explicitly extracting patterns rather 
than finding them as needed in on-line examples. 

These results suggest that induction of trans- 
fer functions from untagged, unanalyzed corpora 
is both theoretically and computationally viable 
as an approach for the development of machine 
translation systems. In particular, computer
time can be substituted for human time if the
appropriate bilingual corpus is available, as it is
for most major languages in instructional text.
The above formalism appears to work well and
produces results which can be impressive in their 
own right as well as easily modified and improved. 